ABSTRACT

The problem of minimax robust source coding under a fidelity criterion for sources whose statistics belong to uncertainty classes determined by 2-alternating Choquet capacities is examined. We consider (i) single-letter difference distortion criteria for discrete memoryless sources whose probability distributions belong to capacity classes and (ii) the mean-square error distortion criterion for stationary Gaussian sources whose spectral measures belong to capacity classes. Both block source codes and trellis source codes are considered. It is shown that there exists an ensemble of block source codes and an ensemble of trellis codes such that for all rates larger than a critical rate and all sources in the class the average distortion converges to any prescribed fidelity level exponentially with increasing block length or constraint length, respectively. Besides the rate distortion function, the distortion exponent of the class is also evaluated.
I. INTRODUCTION

For sources whose statistical description (i.e., the probability distribution which governs the source statistics) is known, Shannon [1] in his renowned source coding theorem showed that, if the code rate is larger than a critical rate (termed the rate distortion function), we can by using block source codes represent the source in any given alphabet and asymptotically satisfy a fidelity (distortion) criterion. Furthermore, it was shown (e.g., [2], [3]) that the asymptotic convergence to a given distortion level is exponential in the code length. Similar results for trellis source codes were established in [4] and [3].

For sources whose statistics are not perfectly known but the determining quantity (e.g., the probability distribution) belongs to a class, the rate distortion function was found in [5] to be $\sup_{s \in S} R_s(D)$, where $S$ is the class of probability distributions of the sources and $R_s(D)$ is the rate distortion function at a distortion level $D$ for a particular member $s$ of the class.

As far as coding is concerned two approaches have been followed. According to the first approach, termed universal coding and described in [6] - [10] (we have not attempted to compile a complete listing of all the papers on the subject), the source is represented or approximated as a finite composite of stationary ergodic subsources and a union code is formed from the codes which are optimal for these representative subsources. Then, all sources in the class have asymptotically optimal coded performance. This approach is applicable to a large number of cases (e.g., general alphabets and distortion measures), since it is independent of the source statistics. Two disadvantages of this approach are: (i) a large number of representative subsources may be necessary and (ii) the construction of the representative subsources for a given class can be very complicated.
The second approach, termed minimax robust coding, is based on a worst-case design. The least-favorable source is singled out and we subsequently code for it. Then the average distortion approaches any prescribed fidelity level exponentially with increasing block length for all sources in the class. The disadvantage of this approach is that the asymptotic coded performance is optimal only for the least-favorable source in the class. However, this approach is attractive because it requires only one representative source for the class (the least-favorable one), which can be explicitly found in several interesting cases. This approach was considered in [11] for some different classes of sources than those considered in this paper. For stationary Gaussian sources with spectral uncertainty within classes similar to those considered in this paper the rate distortion function over the class was derived in [12], but no minimax source coding theorems were established. Finally, minimax noiseless block source coding was considered in [13] and [14]. Again, no effort to compile a complete listing of all the papers on the subject has been made.

In this paper we apply the minimax coding approach for block and trellis source codes which satisfy a fidelity criterion to discrete memoryless sources (DMS's) and discrete-time stationary Gaussian sources (SGS's) which belong to classes determined by 2-alternating Choquet capacities [15]. Our choice of these uncertainty models is justified in two ways. First, important uncertainty models like contaminated mixtures [16], total variation neighborhoods [16], band models [17] - [18], and extended p-point models [18] are capacity classes and have played an important role in hypothesis testing [19] and filtering [20]. Second, the least-favorable sources can be explicitly found for the uncertainty classes described by any of the above models. In this paper we restrict attention to DMS's and discrete-time SGS's (continuous-time SGS's are also discussed), because these are the simplest nontrivial cases of interest with which our techniques can be illustrated. Our results can be extended to other classes of sources; e.g., homogeneous first-order Markov sources. Similar results for the dual minimax robust channel coding problem are described in [21].

This paper is organized as follows. Minimax robust coding under a fidelity criterion is discussed in Section II for difference distortion criteria and discrete memoryless sources with uncertainty in the probability distribution, and in Section III for the mean-square-error distortion criterion and stationary Gaussian sources with spectral uncertainty. In each of these sections we first present the uncertainty models that we consider and introduce the necessary notation. Next, we formulate the mismatch source coding problem and establish the appropriate coding theorems for both block and trellis source codes. Finally, we derive the coding theorems for minimax robust source coding over the uncertainty class. In particular, we show that there exist an ensemble of block source codes and an ensemble of trellis source codes such that, provided the code rate is larger than a critical rate, the average distortion converges exponentially to any prescribed fidelity level with increasing block length or constraint length, respectively, for all sources in the uncertainty class. Then, in Section IV a brief summary of this paper and some conclusions are presented.
II. ROBUST CODING FOR DISCRETE MEMORYLESS SOURCES

A.  Uncertainty  Classes  Generated  by  Choquet  Capacities 

Suppose that $U$ is the source alphabet, $V$ is the representation (or user) alphabet, and $\mathcal{F}$, $\mathcal{G}$ are the $\sigma$-algebras generated by subsets of $U$ and $V$, respectively. A discrete memoryless source is characterized by the probability measure $Q(A)$, $A \in \mathcal{F}$. We assume that the probability measures $Q$ are only known to lie in a convex class generated by a Choquet 2-alternating capacity [15]

$$\mathcal{Q}_w = \{ Q \in \mathcal{Q} \mid Q(A) \le w(A),\ A \in \mathcal{F} \}, \tag{1}$$

where $\mathcal{Q}$ denotes the class of all probability measures on $(U, \mathcal{F})$, and $w$ is a 2-alternating capacity on $(U, \mathcal{F})$ with $w(U) = 1$.

A Choquet 2-alternating capacity [15] on $(U, \mathcal{F})$ is a finite set function which is increasing, continuous from below, continuous from above on closed sets, and satisfies $w(\emptyset) = 0$ and $w(A \cup B) + w(A \cap B) \le w(A) + w(B)$ for all $A, B \in \mathcal{F}$. Notice that any finite measure $w$ is a 2-alternating capacity; in this case the uncertainty class generated by (1) reduces to $\mathcal{Q}_w = \{w\}$. If we further assume that $U$ is compact then all the uncertainty models mentioned in the Introduction are capacity classes. If $U$ is not compact [e.g., $U = (-\infty, \infty)$] only the band model can be defined in terms of a capacity.

Examples of 2-alternating capacity classes are: the band model [17] defined by

$$\mathcal{Q}_{w_1} = \{ Q \in \mathcal{Q} \mid Q_0(A) \le Q(A) \le Q_1(A),\ \forall A \in \mathcal{F} \}, \tag{2a}$$

where $Q_0$ and $Q_1$ are known measures (not necessarily probability measures) with $Q_0(U) < 1 < Q_1(U)$; the $\epsilon$-contaminated mixtures model [16] defined by

$$\mathcal{Q}_{w_2} = \{ Q \in \mathcal{Q} \mid Q(A) = (1-\epsilon) Q_0(A) + \epsilon P(A),\ \forall A \in \mathcal{F},\ P \in \mathcal{Q} \}, \tag{2b}$$

where $Q_0$ is a known probability measure and the number $\epsilon$ in $(0,1)$ is the degree of uncertainty in the model; and the total variation model [16] defined by

$$\mathcal{Q}_{w_3} = \{ Q \in \mathcal{Q} \mid |Q(A) - Q_0(A)| \le \epsilon,\ \forall A \in \mathcal{F} \}, \tag{2c}$$

where $Q_0$ is a known probability measure and the number $\epsilon$ in $(0,1)$ is again the degree of uncertainty in the model. Then (2a) - (2c) can be expressed in the form (1) if we set

$$w_1(A) = \min\{ 1 - Q_0(A^c),\ Q_1(A) \} \tag{3a}$$

for the band model,

$$w_2(A) = (1-\epsilon) Q_0(A) + \epsilon \tag{3b}$$

for the $\epsilon$-contaminated model, and

$$w_3(A) = \min\{ Q_0(A) + \epsilon,\ 1 \} \tag{3c}$$

for the total variation model. See [18] for a description of the p-point capacity class.
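The capacities (3a) - (3c) can be checked numerically on a small finite alphabet. The following is only an illustrative sketch: the function names are hypothetical, measures are stored as dictionaries of point masses, the empty set is assigned capacity 0 as the definition of a capacity requires, and the band model is assumed to satisfy $q_0 \le q_1$ pointwise.

```python
from itertools import combinations


def subsets(alphabet):
    """Return every subset of a finite alphabet as a frozenset."""
    items = list(alphabet)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def mass(weights, event):
    """Total weight of an event under a (not necessarily normalized) measure."""
    return sum(weights[u] for u in event)


def w_band(q0, q1, alphabet):
    """Capacity (3a): w(A) = min{1 - Q0(A^c), Q1(A)}."""
    full = frozenset(alphabet)
    return lambda A: min(1.0 - mass(q0, full - A), mass(q1, A))


def w_contaminated(q0, eps):
    """Capacity (3b) for nonempty A; the empty set gets capacity 0."""
    return lambda A: (1.0 - eps) * mass(q0, A) + eps if A else 0.0


def w_total_variation(q0, eps):
    """Capacity (3c) for nonempty A; the empty set gets capacity 0."""
    return lambda A: min(mass(q0, A) + eps, 1.0) if A else 0.0


def is_two_alternating(w, alphabet, tol=1e-12):
    """Check w(A u B) + w(A n B) <= w(A) + w(B) over all pairs of events."""
    events = subsets(alphabet)
    return all(w(A | B) + w(A & B) <= w(A) + w(B) + tol
               for A in events for B in events)
```

The brute-force check enumerates all pairs of events, so it is only practical for alphabets with a handful of letters; it is meant as a sanity check of the definitions, not as part of any coding procedure.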

In the sequel we will need the following fundamental result due to Huber and Strassen [19]:

Lemma 1: If $w$ is a 2-alternating capacity on $(U, \mathcal{F})$, $\mathcal{Q}_w$ is a convex class of probability measures determined by it as in (1), and $\lambda$ is the Lebesgue measure on $U$, then there exists a unique Lebesgue measurable function $\pi_w : U \to [0, \infty]$ with the defining property that for all $x \in [0, \infty]$ and $A_x$ defined by $A_x = \{\pi_w > x\}$

$$x \lambda(A_x) + w(A_x^c) \le x \lambda(A) + w(A^c), \quad \forall A \in \mathcal{F}. \tag{4}$$

Furthermore there exists a measure $\hat{Q}$ in $\mathcal{Q}_w$ such that for all $x \in [0, \infty]$,

$$\hat{Q}(\{\pi_w \le x\}) = w(\{\pi_w \le x\}), \tag{5}$$

which means that $\hat{Q}$ makes $\pi_w$ stochastically smallest over all $Q$ in $\mathcal{Q}_w$, and $\pi_w$ is a version of $d\hat{Q}/d\lambda$, the generalized Radon-Nikodym derivative of $\hat{Q}$ with respect to $\lambda$; that is, $d\hat{Q}/d\lambda$ may be infinite on sets of $\lambda$ measure 0.

The function $\pi_w$ is termed the Huber-Strassen derivative of $w$ with respect to $\lambda$ ($w$ may not be a measure). The probability measure $\hat{Q}$ singled out by Lemma 1 is termed the least-favorable measure of the class $\mathcal{Q}_w$. Let $\hat{Q} = \hat{Q}' + \hat{Q}''$ be the Lebesgue decomposition of $\hat{Q}$, where $\hat{Q}'$ is absolutely continuous with respect to $\lambda$ and $\hat{Q}''$ is singular with respect to $\lambda$ (that is, it concentrates all its mass on sets of $\lambda$ measure 0). Then,

$$\hat{Q}'(A) = \int_A \pi_w \, d\lambda \tag{6a}$$

and

$$\hat{Q}''(A) = w(A \cap \{\pi_w = \infty\}), \tag{6b}$$

for all $A \in \mathcal{F}$. For the band model the Huber-Strassen derivative $\pi_w = \hat{q}_1$ is defined as

$$\hat{q}_1(u) = \max\{ q_0(u),\ \min\{ c,\ q_1(u) \} \}, \tag{7a}$$

where $q_j = dQ_j/d\lambda$ is the Radon-Nikodym (R-N) derivative of $Q_j$ of (2a) for $j = 0, 1$ and $c$ is chosen so that $\hat{Q}_1(U) = 1$. For the $\epsilon$-contaminated model the corresponding definition is

$$\hat{q}_2(u) = \max\{ (1-\epsilon) q_0(u),\ c \}, \tag{7b}$$

where $q_0$ is the R-N derivative of $Q_0$ of (2b) and $c$ is chosen so that $\hat{Q}_2(U) = 1$. Similarly for the total variation neighborhood model we have

$$\hat{q}_3(u) = \max\{ c',\ \min\{ c'',\ q_0(u) \} \}, \tag{7c}$$

where $q_0$ is the R-N derivative of the $Q_0$ of (2c) and $c'$, $c''$ are chosen so that $\hat{Q}_3(U) = 1$. See [18] for the definition of $\hat{q}$ for the p-point class model.
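For a finite alphabet, the constant $c$ in (7b) can be found numerically. The sketch below is an illustration under stated assumptions rather than a procedure taken from the text: it treats the nominal density as a pmf given as a list, uses counting measure in place of Lebesgue measure, and locates $c$ by bisection (the function name and the iteration count are hypothetical choices).

```python
def least_favorable_contaminated(q0, eps, iters=200):
    """Least-favorable pmf of (7b): q(u) = max{(1 - eps) * q0(u), c}.

    q0  : nominal pmf as a list of probabilities
    eps : contamination level in (0, 1)
    The constant c is found by bisection so that the result sums to 1.
    """
    base = [(1.0 - eps) * x for x in q0]
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        # The total mass is nondecreasing in c, so bisection applies.
        if sum(max(b, c) for b in base) > 1.0:
            hi = c
        else:
            lo = c
    c = 0.5 * (lo + hi)
    return [max(b, c) for b in base], c
```

For instance, with nominal pmf `[0.5, 0.3, 0.15, 0.05]` and `eps = 0.2`, the two least likely letters are raised to a common floor while the others keep their scaled nominal mass, which is the flattening effect that (7b) describes.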
It should be noted that Huber-Strassen derivatives of generalized capacities [a generalized capacity is defined in the same way as a 2-alternating capacity except that it is required to be continuous from above on compact (and not just closed) sets] with respect to $\sigma$-finite (and not just finite) measures can be constructed [22, Chapter IV]. One of the implications of this extension is that several of the most useful examples of capacity classes (e.g., $\epsilon$-mixtures, variation neighborhoods) are generalized capacities when $U$ is $\sigma$-compact (and not just compact). Then, if $U$ is $\sigma$-compact and thus $\lambda$ is $\sigma$-finite, Lemma 1 still holds.

For the proofs of the minimax robust coding theorems in Section II.C below we need to assume that the least-favorable measure $\hat{Q}$ of the capacity class $\mathcal{Q}_w$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ on $U$ (i.e., $\hat{Q} \ll \lambda$). This assumption is satisfied, provided that $Q_1 \ll \lambda$ for the band class and $Q_0 \ll \lambda$ for the $\epsilon$-contaminated and total variation neighborhood classes.

We would also like to mention at this point that if $U$ is a discrete alphabet then $\hat{q}(u)$ for $u \in U$ becomes a probability mass function (pmf) and all the results involving the capacities described above still hold, provided that we replace the integrals with respect to $u$ with sums and the Radon-Nikodym derivatives with pmf's. This duality becomes possible if we replace the Lebesgue measure on $U$ (in the continuous case) with the measure which assigns equal mass to all the elements of $U$ (in the discrete case) in Lemma 1 and apply the Huber-Strassen theory to this case. See [23] for a more extensive discussion of this duality. Therefore, in the sequel we will be working with continuous amplitude sources, pdf's, and integrals, but the results will still be valid for the corresponding situations with discrete amplitude sources.
B. Mismatch Source Coding Theorems for Block and Trellis Codes

In this paper we consider DMS's with $V = (-\infty, \infty)$ and single-letter difference distortion measures $d(u,v) = d(v-u)$, $u \in U$, $v \in V$, which satisfy the bounded variance condition

$$\int_U d^2(u) \, dQ(u) \le d_0^2 \tag{8}$$

for some finite positive number $d_0$ and all $Q$ in $\mathcal{Q}_w$. Let $D$ be the maximum level of average distortion per letter that we can tolerate when we represent letters from the source alphabet $U$ with letters from the user alphabet $V$. We define the average distortion per letter associated with a difference distortion criterion $d(\cdot)$ which satisfies (8), a conditional probability density function (pdf) $p(v \mid u)$, and a source with probability measure $Q$ in $\mathcal{Q}_w$ by

$$D(p,Q) = \int_U \int_V p(v \mid u) \, d(u,v) \, \lambda(dv) \, dQ(u). \tag{9}$$

Furthermore, if we assume that $p(v \mid u) = p(v-u)$ for $u \in U$, $v \in V$, (9) is equivalent to

$$D(p,Q) = D(p) \, Q'(U) \le D(p) = \int_V d(z) \, p(z) \, \lambda(dz). \tag{10}$$

Equation (10) follows from (9) after performing the substitution $z = v-u$ and taking into account the facts that $V = (-\infty, \infty)$ and $Q'(U) \le Q(U) = 1$. Notice that in (10) the average distortion $D(p,Q)$ is upper-bounded by $D(p)$, which does not depend on $Q$. This fact will be critical for establishing the results of the next sections. The need for the aforementioned assumptions will become clear in subsection II.C (during the proof of the main results stated in Theorems 3 and 4).
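The substitution behind (10) can be written out for a fixed source letter $u$. The lines below are a sketch under the stated assumptions $V = (-\infty, \infty)$ and $p(v \mid u) = p(v-u)$, and use only the translation invariance of Lebesgue measure:

```latex
\begin{align*}
\int_V p(v \mid u)\, d(u,v)\, \lambda(dv)
  &= \int_{-\infty}^{\infty} p(v-u)\, d(v-u)\, \lambda(dv) \\
  &= \int_{-\infty}^{\infty} p(z)\, d(z)\, \lambda(dz) \qquad (z = v-u) \\
  &= D(p).
\end{align*}
```

The inner integral in (9) therefore takes the same value for every $u$, so integrating it against the source measure cannot produce anything larger than $D(p)$, whatever member of the class generates the source.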

Suppose  now  that  in  the  presence  of  uncertainty  about  Q  the  user  mistakenly 
assumes  that  (or  attempts  to  estimate  Q  and  comes  with  an  estimate  that)  Q  is  the 
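$$E_0(\rho,p,\tilde{q};Q) = -\ln \left\{ \int_U \frac{1}{\tilde{q}(u)} \left[ \int_V \tilde{p}(u \mid v)^{\frac{1}{1+\rho}} \, \tilde{p}(v) \, \lambda(dv) \right]^{1+\rho} dQ(u) \right\} \tag{15a}$$

$$= -\ln \left\{ \int_U \left[ \int_V p(v \mid u)^{\frac{1}{1+\rho}} \left[ \int_U p(v \mid u') \, \tilde{q}(u') \, \lambda(du') \right]^{\frac{\rho}{1+\rho}} \lambda(dv) \right]^{1+\rho} dQ(u) \right\} \tag{15b}$$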

and $d_0$ was introduced in (8). For this theorem to be valid it is required that $I(p,\tilde{q};Q) \ge 0$ and $E_0(\rho,p,\tilde{q};Q) \le 0$ for all $\rho$ in $[-1,0]$. However, these inequalities are satisfied for all nontrivial $p$, $\tilde{q}$, and $Q$.

Remark 1. The quantities $I(p,\tilde{q};Q)$ and $E_0(\rho,p,\tilde{q};Q)$ represent the mismatch mutual information function and the mismatch distortion exponent, respectively.

Remark 2. We consider Theorem 1 and Theorem 2, which follows, important in two ways: as being fundamental intermediate results necessary for the proof of the main Theorems 3 and 4 below, and as interesting independent results which characterize source coding with a fidelity criterion (for both block and trellis codes) in the case of mismatch (i.e., when the actual probability measure of the source is different from the estimate employed in the encoding procedure).

Proof: Our proof basically follows from a modification of the proof of the source coding theorem for the matched case $\tilde{q} = q$ ($\tilde{Q} = Q$ has a pdf in this case) given in [3, Section 7.5.1 and Lemma 7.2.2]. Thus we only present here those points of the proof of [3] which were considerably modified. It was shown in [3] that we can express the average distortion $D_C$ achieved using a particular code $C = \{v_1, v_2, \ldots, v_M\}$, where $v_m \in V^n$, $m = 1, 2, \ldots, M$, and $M = [e^{nR}]$, as
$$D_C \le D(p,Q) + d_0 \left[ \int_{U^n} \int_{V^n} p_n(v \mid u) \, B(u,v;C) \, \lambda_n(dv) \, dQ_n(u) \right]^{1/2}. \tag{16}$$

In (16), $dQ_n(u) = \prod_{i=1}^{n} dQ(u_i)$, $p_n(v \mid u) = \prod_{i=1}^{n} p(v_i \mid u_i)$ for $u \in U^n$, $v \in V^n$, $\lambda_n(du) = \prod_{i=1}^{n} \lambda(du_i)$, and

$$B(u,v;C) = \begin{cases} 1, & d_n(u,v) < \min_{v' \in C} d_n(u,v') \\ 0, & d_n(u,v) \ge \min_{v' \in C} d_n(u,v'). \end{cases} \tag{17}$$

Next we proceed to bound $\bar{D}_C$, the average of $D_C$ over the ensemble of block source codes described in Theorem 1. The members of this ensemble are assigned the product distribution $P(C) = \prod_{m=1}^{M} \tilde{p}_n(v_m)$, where $\tilde{p}_n(v) = \prod_{i=1}^{n} \tilde{p}(v_i)$ and $\tilde{p}(v) = \int_U p(v \mid u) \, \tilde{q}(u) \, \lambda(du)$ for $v \in V$. Then, since $\tilde{p}_n(u \mid v) = p_n(v \mid u) \, \tilde{q}_n(u) / \tilde{p}_n(v)$, we can write $E_C$, the quantity inside the brackets in (16), as

$$E_C = \int_{U^n} \frac{1}{\tilde{q}_n(u)} \left[ \int_{V^n} \tilde{p}_n(u \mid v) \, B(u,v;C) \, \tilde{p}_n(v) \, \lambda_n(dv) \right] dQ_n(u), \tag{18}$$

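and apply, for $\rho$ in $[-1,0]$, Hölder's inequality

$$\int f g \, d\mu \le \left( \int f^{\alpha} \, d\mu \right)^{1/\alpha} \left( \int g^{\beta} \, d\mu \right)^{1/\beta}, \tag{19}$$

where $1 \le \alpha \le \infty$, $1 \le \beta \le \infty$, and $\alpha^{-1} + \beta^{-1} = 1$, for $f = \tilde{p}_n(u \mid v)$, $\alpha = 1/(1+\rho)$, $g = B(u,v;C)$, $\beta = 1/(-\rho)$, and $d\mu = \tilde{p}_n(v) \, \lambda_n(dv)$ to obtain

$$E_C \le \int_{U^n} \frac{1}{\tilde{q}_n(u)} \left[ \int_{V^n} \tilde{p}_n(u \mid v)^{\frac{1}{1+\rho}} \, \tilde{p}_n(v) \, \lambda_n(dv) \right]^{1+\rho} \left[ \int_{V^n} B(u,v;C) \, \tilde{p}_n(v) \, \lambda_n(dv) \right]^{-\rho} dQ_n(u). \tag{20}$$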

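Averaging $E_C$ over the code ensemble and applying Jensen's inequality yields

$$\bar{E}_C \le \exp\left[ -n E_0(\rho,p,\tilde{q};Q) \right] \left[ \overline{\int_{V^n} B(u,v;C) \, \tilde{p}_n(v) \, \lambda_n(dv)} \right]^{-\rho}, \tag{21}$$

where $\bar{X}$ denotes averaging $X$ over the code ensemble. In [3, p. 393] it was shown that

$$\overline{\int_{V^n} B(u,v;C) \, \tilde{p}_n(v) \, \lambda_n(dv)} \le M^{-1} \le e^{-nR}. \tag{22}$$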

Finally, by using inequalities (10) and (11), averaging (16) over the ensemble of block source codes, and substituting from (21) and (22) we obtain (14). Then, condition (12), definition (13), and the aforementioned positivity and negativity requirements follow from the requirement that the exponent $E_0(\rho,p,\tilde{q};Q) - \rho R$ be strictly positive for $\rho$ in $[-1,0]$ and the fact that

$$I(p,\tilde{q};Q) = \left. \frac{\partial E_0(\rho,p,\tilde{q};Q)}{\partial \rho} \right|_{\rho = 0} \tag{23}$$

in the same way as for the usual source coding theorem (e.g., [3, Lemmas 7.2.2 and 7.2.3]).

We would like at this point to emphasize the necessity of the assumption that $Q$ is absolutely continuous with respect to $\lambda$ (i.e., $Q \ll \lambda$). This was essential in deriving equations (18), (20), (21) and (22).

We now show that the positivity and negativity requirements on $I(p,\tilde{q};Q)$ and $E_0(\rho,p,\tilde{q};Q)$ for all $\rho$ in $[-1,0]$, respectively, are satisfied for all pairs $(\tilde{q}, Q)$. To show the positivity of $I(p,\tilde{q};Q)$ we only need to apply the inequality $\ln x \ge 1 - x^{-1}$ ($x > 0$) and use the fact that $p(v \mid u)$ and $\tilde{p}(v)$ are pdf's. To show the negativity of $E_0(\rho,p,\tilde{q};Q)$ given by (15a) for all $\rho$ in $[-1,0]$ we first apply Hölder's inequality [see (19)] for $f = \tilde{p}(u \mid v)$, $\alpha = 1/(1+\rho)$, $g = 1$, $\beta = 1/(-\rho)$, and $d\mu = \tilde{p}(v) \, \lambda(dv)$; then we multiply both members of the resulting inequality by $1/\tilde{q}(u)$, integrate over $U$ with respect to $dQ(u)$, and finally take the negative logarithm.
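These sign properties, and the derivative relation (23), can be checked numerically in the discrete-alphabet setting. The sketch below is illustrative only: it replaces integrals by sums, evaluates the exponent in the form (15b), and takes the mismatch mutual information to be the average of $\ln[p(v \mid u)/\tilde{p}(v)]$ under the true source, which is the form implied by the function $G$ used later in the proof of Theorem 3. The function names are hypothetical, `p[u][v]` stands for $p(v \mid u)$, and all entries are assumed strictly positive.

```python
import math


def output_pmf(p, q_tilde):
    """Induced output pmf: p~(v) = sum_u q~(u) p(v|u)."""
    return [sum(q_tilde[u] * p[u][v] for u in range(len(p)))
            for v in range(len(p[0]))]


def mismatch_information(p, q_tilde, q):
    """Mismatch mutual information (in nats) for true pmf q, assumed pmf q~."""
    pt = output_pmf(p, q_tilde)
    return sum(q[u] * p[u][v] * math.log(p[u][v] / pt[v])
               for u in range(len(p)) for v in range(len(pt)))


def mismatch_exponent(rho, p, q_tilde, q):
    """Discrete analogue of (15b); valid for -1 < rho <= 0."""
    pt = output_pmf(p, q_tilde)
    a = 1.0 / (1.0 + rho)      # exponent applied to p(v|u)
    b = rho / (1.0 + rho)      # exponent applied to the induced output pmf
    total = 0.0
    for u in range(len(p)):
        inner = sum(p[u][v] ** a * pt[v] ** b for v in range(len(pt)))
        total += q[u] * inner ** (1.0 + rho)
    return -math.log(total)
```

A one-sided finite difference of `mismatch_exponent` at `rho = 0` should agree with `mismatch_information` up to the step size, which is the discrete counterpart of (23); the endpoint `rho = -1` is excluded here because the exponent `1/(1+rho)` is unbounded there.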

Similarly, for trellis source codes (used as described in [3, Section 7.4]) with the necessary modifications for the case of mismatch, the following result holds:

Theorem 2: Under the assumptions of Theorem 1, consider the ensemble of trellis codes of constraint length $K$ and rate $R = \frac{1}{n} \ln M$ nats per source symbol satisfying (12), which is generated by assigning $n$ letters from the alphabet $V$ independently and according to $\tilde{p}(v) = \int_U p(v \mid u) \, \tilde{q}(u) \, \lambda(du)$, $v \in V$, to the branches of the trellis. Then, the average distortion $\bar{D}_C$ over the ensemble of trellis source codes is upper-bounded by $D_K(\rho,p,\tilde{q};Q)$ given by


$$D_K(\rho,p,\tilde{q};Q) = D + \frac{d_0 \, M^{(K-1)\rho}}{1 - M^{-[E_0(\rho,p,\tilde{q};Q)/R - \rho]}}, \tag{24}$$

where $-1 < \rho < E_0(\rho,p,\tilde{q};Q)/R$.


Proof: It is a modification of the proof of the source coding theorem for trellis codes (see [3, Section 7.5.2]) for the matched case ($\tilde{q} = q$) which takes into account the mismatch arguments established during the proof of Theorem 1; therefore we do not repeat it here.


C.  Minimax  Robust  Source  Coding  Theorems  for  Block  and  Trellis  Codes 

In this section we assume that the probability measure which governs the statistics of the source is only known to lie in a class of the form (1) described in Section II.A. The source encoder employs a measure $\tilde{Q}$ in the way described in Theorems 1 and 2. The goal is to choose $\tilde{Q}$ so that for all code rates larger than a critical rate the probability of erroneous representation approaches zero with increasing block length for all sources in the class.

To prove the main results of this section we need to assume that the following condition is satisfied:

Condition $C_0$: The conditional pdf $p$ which, for a given probability measure $\tilde{Q}$ and the corresponding density $\tilde{q}$, minimizes $I(p,\tilde{q};\tilde{Q})$ under the constraint (11) is of the form $p(v \mid u) = p(v-u)$ for all $u \in U$ and $v \in V$.

What we really require with condition $C_0$ is that

$$\arg\min_{p \in P_D} I(p,\tilde{q};\tilde{Q}) = \arg\min_{p \in P'_D} I(p,\tilde{q};\tilde{Q}), \tag{25}$$

where

$$P_D = \{ p \in P \mid D(p) \le D \},$$

$$P'_D = \{ p \in P \mid p(v \mid u) = p(v-u),\ \forall (u,v) \in U \times V,\ D(p) \le D \},$$

and $P$ is the class of all conditional pdf's. Therefore, if condition $C_0$ is satisfied, we only need to consider the smaller class $P'_D$ for which the average distortion $D(p,Q)$ is upper-bounded by $D(p)$ [see (10)]. To restrict attention to minimizations over the smaller class $P'_D$ turns out to be necessary for Theorems 3 and 4 below to be valid. Condition $C_0$ is not so restrictive when we consider difference distortion criteria and continuous amplitude sources with $V = (-\infty, \infty)$. For example, discrete memoryless Gaussian sources with uncertainty in their probability distribution described by capacity classes satisfy the above condition and so do the stationary Gaussian sources with spectral uncertainty considered in Section III.
We now state and prove the main results of this section:

Theorem 3: Suppose the probability distribution $Q$ belongs to a class of the form (1) and $\hat{Q}$ (let $\hat{Q} \ll \lambda$ and $\hat{q} = d\hat{Q}/d\lambda$) is the least-favorable element of the class singled out by Lemma 1. Then the following inequalities are true for all conditional pdf's $p$, all $Q$ in $\mathcal{Q}_w$, and all $\rho$ in $[-1,0]$:

$$I(p,\hat{q};Q) \le I(p,\hat{q};\hat{Q}) \le I(p,\tilde{q};\hat{Q}) \tag{26}$$

and

$$E_0(\rho,p,\hat{q};Q) \ge E_0(\rho,p,\hat{q};\hat{Q}). \tag{27}$$

Suppose further that condition $C_0$ is satisfied for any fidelity level $D$. We consider pdf's $p$ in the set $P'_D$ described above. Then the operating point $(\hat{p},\hat{q})$, where $\hat{p} = \arg\min_{p} I(p,\hat{q};\hat{Q})$ for $p$ satisfying $D(p) \le D$, and the source determined by $\hat{Q}$ form a saddle point for $\min_{(p,\tilde{q})} \max_{Q} I(p,\tilde{q};Q)$ under the fidelity constraint $D(p,Q) \le D(p) \le D$; i.e.,

$$I(\hat{p},\hat{q};Q) \le I(\hat{p},\hat{q};\hat{Q}) \le I(p,\tilde{q};\hat{Q}). \tag{28}$$

The triple $(\hat{p},\hat{q},\hat{Q})$ also represents a least-favorable operating point for the distortion exponent, as eq. (27) applied for $p = \hat{p}$ indicates. Finally, the condition

$$R > I(\hat{p},\hat{q};\hat{Q}) \tag{29}$$

is sufficient and necessary to guarantee that for the ensemble of block source codes of length $n$ and rate $R$ determined by $\hat{q}$ (as described in Theorem 1, just set $\tilde{q} = \hat{q}$) the average distortion converges to the fidelity level $D$ exponentially with increasing $n$ for all sources in the class.
Remark 3. The quantity $I(\hat{p},\hat{q};\hat{Q})$ represents the rate distortion function of the class defined by (1).

Remark 4. Besides the inequality in (28) the following inequality also holds

$$I(\hat{p},q;Q) \le I(\hat{p},\hat{q};\hat{Q}) \le I(p,\hat{q};\hat{Q}) \tag{30}$$

under the fidelity constraint $D(p,Q) \le D(p) \le D$, which implies that $I(\hat{p},\hat{q};\hat{Q}) = \max_{Q} \min_{p} I(p,q;Q)$ for $Q \ll \lambda$; that is, the rate distortion function for the source determined by the least-favorable measure $\hat{Q}$ is the worst-case rate distortion function over all sources in $\mathcal{Q}_w$ [in the usual notation: $R_{\hat{Q}}(D) = \sup_{Q \in \mathcal{Q}_w} R_Q(D)$].


Proof: We first prove the inequalities (26) and (27). To prove the right-hand side inequality in (26) we use Jensen's inequality to show that $I(p,\hat{q};\hat{Q}) - I(p,\tilde{q};\hat{Q}) \le 0$. For the left-hand side inequality in (26) we notice that it is equivalent to

$$\int_U G(\hat{q}) \, dQ \le \int_U G(\hat{q}) \, d\hat{Q}, \tag{31}$$

where $G(\hat{q}) = \int_V p(v \mid u) \ln \dfrac{p(v \mid u)}{\int_U \hat{q}(u') \, p(v \mid u') \, \lambda(du')} \, \lambda(dv)$. Since $G(\hat{q})$ is a decreasing function of $\hat{q} = \pi_w$, and according to Lemma 1 the least-favorable measure $\hat{Q}$ makes $\pi_w$ stochastically smallest over all $Q$ in $\mathcal{Q}_w$, the inequality in (31) is satisfied. Similarly, to prove the inequality in (27) notice that it is equivalent [via (15b)] to the inequality

$$\int_U H(\hat{q}) \, dQ \le \int_U H(\hat{q}) \, d\hat{Q}, \tag{32}$$

where $H(\hat{q}) = \left[ \int_V p(v \mid u)^{\frac{1}{1+\rho}} \left[ \int_U \hat{q}(u') \, p(v \mid u') \, \lambda(du') \right]^{\frac{\rho}{1+\rho}} \lambda(dv) \right]^{1+\rho}$. Since $H(\hat{q})$ is a decreasing function of $\hat{q} = \pi_w$ for $\rho$ in $[-1,0]$, and according to Lemma 1 $\hat{Q}$ makes $\pi_w$ stochastically smallest over all $Q$ in $\mathcal{Q}_w$, (32) is satisfied and so is (27).

Next we prove inequality (28). The left-hand side inequality in (28) follows from the left-hand side inequality in (26) for $p = \hat{p}$. Then, the right-hand side inequality in (28) follows from the fact that, because of the definition of $\hat{p}$, $I(\hat{p},\hat{q};\hat{Q}) \le I(p,\hat{q};\hat{Q})$, and from the right-hand side inequality in (26). The constraint $D(\hat{p}) \le D$ is satisfied since the minimization of $I(p,\hat{q};\hat{Q})$ was performed under the constraint $D(p) \le D$. Condition $C_0$ is critical because it enables us to consider pdf's of the form $p(v \mid u) = p(v-u)$ and thus use eq. (10), which guarantees that $D(p,Q)$ is upper-bounded by $D(p)$, which is independent of $Q$.

We now proceed to the final stage of the proof of Theorem 3. First, because of (28), condition (29) implies that $R > I(\hat{p},\hat{q};Q)$ for all $Q$ in $\mathcal{Q}_w$. Furthermore, as discussed above, $D(\hat{p}) \le D$ is satisfied independent of $Q$. Thus Theorem 1 applied for $p = \hat{p}$ implies that for the ensemble of block source codes of rate $R$ and length $n$ [for which the $n$ letters of each codeword are chosen from the user alphabet $V$ independently and according to $\hat{p}(v) = \int_U \hat{q}(u) \, \hat{p}(v \mid u) \, \lambda(du)$, $v \in V$, while the $[e^{nR}]$ codewords are chosen independently and with equal probability] the average distortion converges to the fidelity level $D$ exponentially with increasing $n$. Since this is true for all $Q$ in the class under consideration, the sufficiency of condition (29) is established. To prove its necessity, notice that, according to the usual converse source coding theorem under a fidelity criterion for the matched case, $R < I(\hat{p},\hat{q};\hat{Q})$ implies that fidelity level $D$ cannot be reached for the source determined by the least-favorable measure $\hat{Q}$, which is a member of the aforementioned class. This completes the proof of Theorem 3.


The proof of eq. (30) stated in Remark 4 is a result of the inequalities:

$$I(\hat{p},q;Q) \le I(\hat{p},\hat{q};Q) \le I(\hat{p},\hat{q};\hat{Q}) \le I(p,\hat{q};\hat{Q}),$$

where $D(\hat{p}) \le D$ and $D(p) \le D$. The first inequality follows from an application of Jensen's inequality, the second inequality was proved as part of eq. (28), and the third inequality follows from the definition of $\hat{p}$ as the minimizing argument for $I(p,\hat{q};\hat{Q})$.
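The Jensen step can be made explicit. The sketch below assumes that the mismatch mutual information has the form $I(p,\tilde{q};Q) = \int_U G(\tilde{q}) \, dQ$ implied by (31), that $Q$ has density $q$ with respect to $\lambda$, and that the induced output density for an assumed source density is positive wherever the true one is. The shorthand $p_q(v) = \int_U q(u) \, p(v \mid u) \, \lambda(du)$ is introduced here for illustration and is not notation from the text.

```latex
\begin{align*}
I(p,\tilde{q};Q) - I(p,q;Q)
  &= \int_U \int_V q(u)\, p(v \mid u)\,
     \ln \frac{p_q(v)}{p_{\tilde{q}}(v)}\, \lambda(dv)\, \lambda(du) \\
  &= \int_V p_q(v)\, \ln \frac{p_q(v)}{p_{\tilde{q}}(v)}\, \lambda(dv) \\
  &\ge -\ln \int_V p_{\tilde{q}}(v)\, \lambda(dv) = 0 .
\end{align*}
```

Under these assumptions the matched information is never larger than the mismatched one for the same source, and the gap is the Kullback-Leibler divergence between the two induced output densities.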

At this point we discuss the choice of the operating point, that is, of a triplet of the form $(\rho,p,\tilde{q})$, where $\rho$ is the parameter in $[-1,0]$ involved in the distortion exponent $E_0(\rho,p,\tilde{q};Q) - \rho R$, $p$ is a conditional pdf in the class $P'_D$ of difference-form conditional pdf's introduced with condition $C_0$, and $\tilde{q}$ characterizes the ensemble of block source codes. If our main objective is to operate at the minimum required rate [see (29)], then the operating point should be $(\hat{\rho},\hat{p},\hat{q})$, where $\hat{\rho} = \arg\max_{\rho} [E_0(\rho,\hat{p},\hat{q};\hat{Q}) - \rho R]$. However, if our main objective is to minimize the average distortion, then $(\tilde{\rho},\tilde{p},\hat{q})$, where $(\tilde{\rho},\tilde{p}) = \arg\max_{(\rho,p)} [E_0(\rho,p,\hat{q};\hat{Q}) - \rho R]$ for $\rho$ in $[-1,0]$ and $p$ satisfying $D(p) \le D$, should be the operating point and the rate $R$ should satisfy $R > I(\tilde{p},\hat{q};\hat{Q})$ instead of (29).

Notice that, in contrast to the corresponding result for the dual robust channel coding
problem (see [21]), the operating point $(\tilde{\rho},\tilde{p},\hat{q})$ and the source determined by $\hat{Q}$ do
not form a saddle point for $\max_{(\rho,p,\tilde{q})} \min_{Q}\,[E_0(\rho,p,\tilde{q};Q) - \rho R]$. This is due to the fact that in
general it is not true that $E_0(\rho,p,\hat{q};\hat{Q}) \ge E_0(\rho,p,\tilde{q};\hat{Q})$ for $\tilde{q} \ne \hat{q}$.

As a final comment on Theorem 3, notice that the minimum and maximum
involved [minimizing argument $\hat{p}$, maximizing argument $(\tilde{\rho},\tilde{p})$] exist, since the functions
$I(p,\hat{q};\hat{Q})$ and $[E_0(\rho,p,\hat{q};\hat{Q}) - \rho R]$, for $\hat{Q} \ll \lambda$, are convex in $p$ and concave in $(\rho,p)$,
respectively, $\rho \in [-1,0]$, and $p$ belongs to a convex class.

For  trellis  source  codes  a  similar  result  holds: 


Theorem 4: Under the assumptions of Theorem 3, condition (29) guarantees that for the
ensemble of trellis source codes of constraint length $K$ and rate $R = \frac{1}{N}\ln M$ the average
distortion converges to the fidelity level $D$ exponentially with increasing $K$ for all
sources in the class. Furthermore, if we define $(\rho',p') = \arg\min_{(\rho,p)} D_K(\rho,p,\hat{q};\hat{Q})$, where
$-1 < \rho < E_0(\rho,p,\hat{q};\hat{Q})/R$ and $D(p) \le D$, then the following inequalities hold for all
$Q$ in $\mathcal{Q}_w$:

$$D_K(\rho',p',\hat{q};Q) \le D_K(\rho',p',\hat{q};\hat{Q}) \le D_K(\rho,p,\hat{q};\hat{Q}). \qquad (33)$$

Proof: We first prove the inequalities in (33). The left-hand inequality in (33) follows
from the inequality in (27) applied for $\rho = \rho'$ and $p = p'$, and the fact that
$D_K(\rho,p,\tilde{q};Q)$ is a decreasing function of $E_0(\rho,p,\tilde{q};Q)$. The right-hand inequality
follows from the definition of $(\rho',p')$.

To complete the proof of Theorem 4, notice that (29) together with the left-hand
inequality in (28) implies that $R > I(\hat{p},\hat{q};Q)$ for all $Q$ in the capacity class.
Furthermore, any $\rho$ which satisfies $-1 < \rho < E_0(\rho,\hat{p},\hat{q};\hat{Q})/R$ also satisfies [because of (27)]
$-1 < \rho < E_0(\rho,\hat{p},\hat{q};Q)/R$. Consequently, Theorem 2 applied for $p = \hat{p}$ guarantees
that for the ensemble of trellis source codes (for which the $N$ symbols from the alphabet
$V$ are assigned independently and according to $\hat{p}(v)$ to the branches of the trellis) the
average distortion converges to the fidelity level $D$ exponentially with increasing $K$.
Since this is true for all $Q$ in the uncertainty class, the proof is completed.

As discussed at the end of the proof of Theorem 3, the choice of the operating point
depends on our objective. For trellis source codes, if our main objective is to minimize
the required rate, then the operating point should be $(\hat{\rho},\hat{p},\hat{q})$, where
$\hat{\rho} = \arg\min_{\rho}\,[E_0(\rho,\hat{p},\hat{q};\hat{Q}) - \rho R]$; otherwise the operating point should be $(\rho',p',\hat{q})$,
where $(\rho',p')$ is defined as in Theorem 4, and the rate $R$ should satisfy $R > I(p',\hat{q};\hat{Q})$
instead of (29).

Finally, notice that all the minima involved in Theorem 4 [the minimizing arguments
are $(\rho',p')$ and $\hat{\rho}$] exist, since the functions $D_K(\rho,p,\hat{q};\hat{Q})$ and $D_K(\rho,\hat{p},\hat{q};\hat{Q})$ are
convex in $(\rho,p)$ and $\rho$, respectively, $\rho \in [-1,0]$, and $p$ belongs to a convex class.
III. ROBUST CODING FOR STATIONARY GAUSSIAN SOURCES

In this section we describe the problem formulation and the results for robust
minimax coding under a fidelity criterion for stationary Gaussian sources with spectral
uncertainty. We do not include the proofs of the results, since they follow essentially the
same steps as the proofs of the corresponding results of Section II and involve techniques
similar to those used there. We start with the description of spectral uncertainty classes
generated by Choquet capacities.

Suppose that $U = V = (-\infty,\infty)$ for the source and user alphabets and the
discrete-time stationary Gaussian source (SGS) is characterized by the probability
density function $q_n(u)$ for $u \in U^n$

$$q_n(u) = (2\pi)^{-n/2}\,|\Phi_n|^{-1/2} \exp\{-\tfrac{1}{2} u^T \Phi_n^{-1} u\}. \qquad (34)$$

In (34) $|\cdot|$ denotes the determinant of a matrix and $\Phi_n$ is a correlation matrix of order
$n$, which because of the stationarity is a symmetric Toeplitz matrix, associated with the
spectral density $\phi(\omega)$, $\omega \in [-\pi,\pi]$.
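As a small numerical illustration of the association between a spectral density and the Toeplitz matrix in (34), the sketch below computes correlation coefficients under the assumed convention $r_k = (1/2\pi)\int_{-\pi}^{\pi}\phi(\omega)\cos(k\omega)\,d\omega$, which agrees with a total spectral mass of $2\pi\sigma^2$. The function names and the first-order autoregressive example spectrum are hypothetical choices made only for this illustration.

```python
import math

def autocorrelation(phi, k, m=4096):
    """r_k = (1/2pi) * integral over [-pi, pi] of phi(w) cos(k w) dw (midpoint rule)."""
    h = 2.0 * math.pi / m
    total = 0.0
    for i in range(m):
        w = -math.pi + (i + 0.5) * h
        total += phi(w) * math.cos(k * w)
    return total * h / (2.0 * math.pi)

def toeplitz_correlation(phi, n):
    """n x n symmetric Toeplitz matrix whose (i, j) entry is r_{|i-j|}."""
    r = [autocorrelation(phi, k) for k in range(n)]
    return [[r[abs(i - j)] for j in range(n)] for i in range(n)]

# Hypothetical example: unit-variance first-order autoregressive spectrum,
# whose correlation coefficients are known in closed form as ALPHA**|k|.
ALPHA = 0.5

def ar1_density(w):
    return (1.0 - ALPHA ** 2) / (1.0 - 2.0 * ALPHA * math.cos(w) + ALPHA ** 2)

PHI_3 = toeplitz_correlation(ar1_density, 3)
```

For this example the first row of `PHI_3` should be close to (1, 0.5, 0.25), and the matrix is constant along each diagonal, as stationarity requires.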

Suppose that the spectral density $\phi$ is the Radon-Nikodym derivative of a spectral
measure $\Phi$ defined on sets $A \in B$, where $B$ is the $\sigma$-algebra generated by subsets of
$\Omega = [-\pi,\pi]$. The spectral measure $\Phi$ is only known to lie in the convex class $\boldsymbol{\Phi}_w$ defined
by

$$\boldsymbol{\Phi}_w = \{\Phi \in \mathcal{M} \mid \Phi(A) \le w(A),\ \forall A \in B;\ \Phi(\Omega) = w(\Omega)\}. \qquad (35)$$

In (35) $\mathcal{M}$ is the class of all spectral measures on $(\Omega,B)$ and $w$ is a 2-alternating capacity
on $(\Omega,B)$. We impose on the spectral measures $\Phi$ the additional constraint
$\Phi([-\pi,\pi]) = w([-\pi,\pi]) = 2\pi\sigma^2$, which is a fixed source variance constraint and
transforms the normalized spectral measures $\Phi(A)/(2\pi\sigma^2)$ into probability measures; this
is necessary for the validity of the Huber-Strassen theory of least-favorability. Another
implication of the fixed variance constraint $\Phi([-\pi,\pi]) = 2\pi\sigma^2$ is that $\phi(\omega)$ has a finite
supremum for $\omega \in [-\pi,\pi]$. Let $A$ denote the global supremum over all $\phi$ whose spectral
measures $\Phi$ belong to $\boldsymbol{\Phi}_w$.

All the results about Choquet capacities and uncertainty classes of probability
measures generated by them presented in Section II.A are also valid for the spectral
uncertainty classes. Let $\hat{\phi}$ and $\hat{\Phi}$ denote the Huber-Strassen derivative and the
least-favorable spectral measure in this case. In analogy to Section III.A we assume that
$\hat{\Phi} \ll \lambda$, i.e., that $\hat{\Phi}$ is absolutely continuous with respect to $\lambda$, the Lebesgue measure
on $\Omega = [-\pi,\pi]$. This assumption is satisfied if $\Phi_U \ll \lambda$ for the band class and $\Phi_0 \ll \lambda$
for the $\epsilon$-contaminated and total variation neighborhood classes.

For a single-letter mean-square-error distortion measure $d(u,v) = (v - u)^2$,
$(u,v) \in U \times V$, the average distortion constraint takes the form:

$$E\{\|U - V\|^2\} \le nD, \qquad (36)$$

where $\|\cdot\|$ is the Euclidean norm of the $n$-dimensional random vector $U - V$, the
expectation $E$ is with respect to the $n$-dimensional distribution of $(U,V)$, and $D$ is a fixed
fidelity level.

Suppose that in the presence of uncertainty about $\Phi$ the user mistakenly assumes
that $\tilde{\Phi}$ is the spectral measure governing the statistics of the discrete-time stationary
Gaussian source. Let $\phi$ denote the R-N derivative of $\Phi$ and $\tilde{\phi}$ denote the spectral
density of $\tilde{\Phi}$; that is, we assume that $\tilde{\Phi} \ll \lambda$, where $\lambda$ is the Lebesgue measure on $\Omega$. Let
$Q_n$ and $\tilde{Q}_n$ ($\tilde{Q}_n \ll \lambda_n$) be the $n$-th order probability measures induced by the spectral
measures $\Phi$ and $\tilde{\Phi}$, respectively.
The above situation is characterized by mismatch, as in the case described in
Theorem 1. Therefore we can apply Theorem 1 to this special case. For the evaluation
of the mismatch mutual information and mismatch distortion exponent functions it is
now advantageous to follow the technique of [24, Section 4.5.2] and make the problem
equivalent to that of $n$ independent zero-mean Gaussian DMS's. This involves a unitary
transformation of $u$ and $v$ associated with $\Phi_n$ which preserves the mutual information
relationships and the mean-square-error (MSE) distortion constraint.

Furthermore, because of the Gaussian statistics we restrict attention to auxiliary
conditional pdf's $p_n(v|u)$ of the form

$$p_n(v|u) = (2\pi)^{-n/2}\,|A_n R_n|^{-1/2} \exp\{-\tfrac{1}{2}(v - A_n u)^T [A_n R_n]^{-1} (v - A_n u)\} \qquad (37)$$

where $A_n = \mathrm{diag}(a_1,a_2,\ldots,a_n)$ ($a_i \ge 0$ for $i = 1,2,\ldots,n$) is associated with a spectral
density $a(\omega)$, $\omega \in [-\pi,\pi]$, and the $n$-th order Toeplitz matrix $R_n$ is associated with the spectral
density $r(\omega)$, $\omega \in [-\pi,\pi]$. We considered the matrix $A_n$ instead of the identity matrix $I_n$
(i.e., $v - A_n u$ instead of $v - u$) because $p_n(v|u)$, defined by (37) for $A_n = I_n$ and
satisfying (36), is too restrictive to allow for the minimization of the mutual information
function.
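A one-line numerical comparison may help to see why the identity choice is too restrictive. The sketch below assumes a white source (flat spectral density equal to a variance `VAR`) in the matched case, so that every frequency behaves like the scalar channel $v = au + z$ with noise variance $ar$; the mutual information per letter is then $\frac{1}{2}\ln(1 + a\,\mathrm{VAR}/r)$ nats and the mean-square error is $(a-1)^2\mathrm{VAR} + ar$. The names and the numerical values are illustrative assumptions, not quantities taken from the text.

```python
import math

def rate_and_distortion(a, r, var):
    """Scalar Gaussian channel v = a*u + z with Var(u) = var and Var(z) = a*r.

    Returns (mutual information in nats, mean-square error E(v - u)^2).
    """
    rate = 0.5 * math.log(1.0 + a * var / r)
    distortion = (a - 1.0) ** 2 * var + a * r
    return rate, distortion

VAR, D = 1.0, 0.25  # hypothetical source variance and fidelity level

# Identity choice a = 1: the constraint forces r = D.
rate_identity, dist_identity = rate_and_distortion(1.0, D, VAR)

# Scaled choice a = 1 - D/VAR, r = D (the classical Gaussian test channel).
rate_scaled, dist_scaled = rate_and_distortion(1.0 - D / VAR, D, VAR)
```

Both choices meet the distortion level $D = 0.25$, but the identity choice needs $\frac{1}{2}\ln 5$ nats per letter whereas the scaled choice needs only $\frac{1}{2}\ln 4$, which is the Gaussian rate distortion value $\frac{1}{2}\ln(\mathrm{VAR}/D)$.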

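Once the SGS has been decomposed into $n$ independent Gaussian DMS's, we can
apply the theory of [24, Section 4.5.2], (34), (37), the definitions (13) and (15a)-(15b) of
Theorem 1, and the discrete-time version of the Toeplitz Distribution theorem [25] to
put the asymptotic (in the limit of large $n$) mismatch mutual information function and
mismatch distortion exponent in the form:

$$I(a,r,\tilde{\phi};\phi) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left\{\ln\left[1 + \frac{a(\omega)\tilde{\phi}(\omega)}{r(\omega)}\right] + \frac{a(\omega)[\phi(\omega) - \tilde{\phi}(\omega)]}{a(\omega)\tilde{\phi}(\omega) + r(\omega)}\right\}\lambda(d\omega) \qquad (38)$$

$$E_0(\rho,a,r,\tilde{\phi};\phi) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left\{\rho\ln\left[1 + \frac{a(\omega)\tilde{\phi}(\omega)}{(1+\rho)r(\omega)}\right] + \ln\left[1 + \frac{\rho\,a(\omega)[\phi(\omega) - \tilde{\phi}(\omega)]}{(1+\rho)[a(\omega)\tilde{\phi}(\omega) + r(\omega)]}\right]\right\}\lambda(d\omega) \qquad (39)$$

and the average distortion constraint in the form:

$$D(a,r,\phi) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left\{[a(\omega) - 1]^2\phi(\omega) + a(\omega)r(\omega)\right\}\lambda(d\omega) \le D. \qquad (40)$$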

Notice that, as explained in [24, Section 4.5.2] for the matched case, the quantities above
depend on $\phi$, the R-N derivative of $\Phi$, and not on $\Phi$ itself; they only depend on $\tilde{\phi}$
because we assumed that $\tilde{\Phi} \ll \lambda$. For the quantities above we have that
$I(a,r,\tilde{\phi};\phi) \ge 0$ and $E_0(\rho,a,r,\tilde{\phi};\phi) \le 0$ for $-1 \le \rho \le 0$ and all pairs $(a,r)$. These
inequalities can be proved in the same way as the corresponding inequalities $I(p,\tilde{q};Q) \ge 0$
and $E_0(\rho,p,\tilde{q};Q) \le 0$ in Section II.B.

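Next we consider the pair $(\tilde{a},\tilde{r})$ which minimizes $I(a,r,\tilde{\phi};\tilde{\phi})$, the asymptotic
mutual information function, for the matched case ($\phi = \tilde{\phi}$). This pair has been shown in
[24, Section 4.5.2] to be defined in terms of a parameter $\tilde{\theta}$ as:

$$\tilde{a}(\omega) = \begin{cases} 1 - \tilde{\theta}/\tilde{\phi}(\omega), & \text{if } \tilde{\theta} < \tilde{\phi}(\omega) \\ 0, & \text{if } \tilde{\phi}(\omega) \le \tilde{\theta} \end{cases} \qquad (41a)$$

$$\tilde{r}(\omega) = \tilde{\theta}, \quad \omega \in [-\pi,\pi]. \qquad (41b)$$

The parameter $\tilde{\theta}$ is determined by the condition $D(\tilde{a},\tilde{r},\tilde{\phi}) = D$ or, equivalently,

$$D(\tilde{\theta},\tilde{\phi}) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\min\{\tilde{\theta},\tilde{\phi}(\omega)\}\,\lambda(d\omega) = D. \qquad (42)$$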
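The parameter $\tilde{\theta}$ lies in the range $[0,A]$, where $A$ is the essential supremum of $\tilde{\phi}$ (see
[24]). Similarly, the condition $D(\tilde{a},\tilde{r},\phi) \le D$ is transformed into

$$D(\tilde{\theta},\phi) = \frac{1}{2\pi}\left[\int_{\{\tilde{\phi}(\omega) \le \tilde{\theta}\}}\phi(\omega)\,\lambda(d\omega) + \int_{\{\tilde{\theta} < \tilde{\phi}(\omega)\}}\left\{\tilde{\theta} + \tilde{\theta}^2\,\frac{\phi(\omega) - \tilde{\phi}(\omega)}{\tilde{\phi}(\omega)^2}\right\}\lambda(d\omega)\right] \le D. \qquad (43)$$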
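The two distortion conditions can be checked numerically. The following is a minimal sketch, assuming piecewise-constant spectra and a midpoint-rule integral: it solves the matched condition $(1/2\pi)\int\min\{\theta,\tilde{\phi}\} = D$ for the water level by bisection, and then evaluates the mismatch distortion by inserting $a = 1 - \theta/\tilde{\phi}$, $r = \theta$ (and $a = 0$ where $\tilde{\phi} \le \theta$) into the integrand $[a-1]^2\phi + ar$. All function names and the two-level example spectra are hypothetical.

```python
import math

def _grid(m):
    """Midpoints of m equal cells covering [-pi, pi], and the cell width."""
    h = 2.0 * math.pi / m
    return [-math.pi + (i + 0.5) * h for i in range(m)], h

def matched_distortion(theta, phi_assumed, m=4000):
    """(1/2pi) * integral of min{theta, phi_assumed(w)} over [-pi, pi]."""
    ws, h = _grid(m)
    return sum(min(theta, phi_assumed(w)) for w in ws) * h / (2.0 * math.pi)

def water_level(D, phi_assumed, sup_phi, iters=60):
    """Bisection on [0, sup_phi] for theta with matched_distortion(theta) = D."""
    lo, hi = 0.0, sup_phi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if matched_distortion(mid, phi_assumed) < D:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mismatch_distortion(theta, phi_assumed, phi_true, m=4000):
    """Average distortion when the code is designed for phi_assumed with water
    level theta but the source actually has spectral density phi_true."""
    ws, h = _grid(m)
    total = 0.0
    for w in ws:
        pa, pt = phi_assumed(w), phi_true(w)
        if pa <= theta:
            total += pt  # a(w) = 0: this band is not coded at all
        else:
            total += theta + theta ** 2 * (pt - pa) / pa ** 2
    return total * h / (2.0 * math.pi)

def two_level(inner, outer):
    """Hypothetical spectrum: `inner` for |w| < pi/2 and `outer` elsewhere."""
    return lambda w: inner if abs(w) < math.pi / 2 else outer

assumed = two_level(2.0, 0.5)            # variance (2.0 + 0.5) / 2 = 1.25
D = 0.75
theta = water_level(D, assumed, sup_phi=2.0)
d_matched = mismatch_distortion(theta, assumed, assumed)
d_peaked = mismatch_distortion(theta, assumed, two_level(2.2, 0.3))
d_flat = mismatch_distortion(theta, assumed, two_level(1.25, 1.25))
```

In this example all three true spectra have the same variance 1.25. The water level comes out as 1, the more peaked true spectrum gives a distortion of about 0.675 (below $D$), and the flat true spectrum gives about 1.031 (above $D$), so for this particular assumed spectrum the distortion constraint survives the first mismatch and fails under the second.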

We now state the main result for stationary Gaussian sources, which corresponds to
Theorem 1:

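Theorem 5: Consider a discrete-time stationary Gaussian source with $n$-th order
distribution $Q_n$ induced by the spectral density $\phi(\omega)$, $\omega \in [-\pi,\pi]$, and an $n$-th order conditional
pdf $\tilde{p}_n(v|u)$ for $(u,v) \in U^n \times V^n$ of the form (37) induced by the spectral densities
$\tilde{a}(\omega)$ and $\tilde{r}(\omega)$, $\omega \in [-\pi,\pi]$, defined by (41a)-(41b), where the parameter $\tilde{\theta}$ satisfies the
average distortion constraint (42). Assume that for a given source sequence $u \in U^n$ the
source encoder chooses the codeword $v \in V^n$ which minimizes $\|v - u\|$. Consider the
ensemble of block source codes of length $n$ and rate $R$ whose codewords are chosen
independently with equal probability and the $n$ letters of each codeword are chosen
from the user alphabet $V$ according to $\tilde{p}_n(v) = \int_{U^n}\tilde{q}_n(u)\tilde{p}_n(v|u)\lambda_n(du)$, $v \in V^n$,
where the pdf $\tilde{q}_n(u)$ is induced by the spectral density $\tilde{\phi}$. Then, if

$$D(\tilde{\theta},\phi) \le D(\tilde{\theta},\tilde{\phi}), \qquad (44)$$

where the two quantities are defined by (42)-(43), and the rate $R$ satisfies

$$R > I(\tilde{\theta},\tilde{\phi};\phi), \qquad (45)$$

where

$$I(\tilde{\theta},\tilde{\phi};\phi) = \frac{1}{4\pi}\int_{\{\tilde{\theta} < \tilde{\phi}(\omega)\}}\left\{\ln\frac{\tilde{\phi}(\omega)}{\tilde{\theta}} + \frac{[\tilde{\phi}(\omega) - \tilde{\theta}][\phi(\omega) - \tilde{\phi}(\omega)]}{\tilde{\phi}(\omega)^2}\right\}\lambda(d\omega), \qquad (46)$$

then the average distortion $D_c$ over the ensemble of block source codes is for large $n$
upperbounded by

$$D_c \le D + \sqrt{3}\,A\,\exp\left\{-\tfrac{1}{2}\,n\,[E_0(\rho,\tilde{\theta},\tilde{\phi};\phi) - \rho R]\right\}, \qquad (47)$$

where $A$ is the global supremum of $\phi$ for the class of (35) and for $\rho \in [-1,0)$:

$$E_0(\rho,\tilde{\theta},\tilde{\phi};\phi) = \frac{1}{4\pi}\int_{\{\tilde{\theta} < \tilde{\phi}(\omega)\}}\left\{\rho\ln\left[\frac{\tilde{\phi}(\omega) + \rho\tilde{\theta}}{(1+\rho)\tilde{\theta}}\right] + \ln\left[1 + \frac{\rho[\tilde{\phi}(\omega) - \tilde{\theta}][\phi(\omega) - \tilde{\phi}(\omega)]}{(1+\rho)\tilde{\phi}(\omega)^2}\right]\right\}\lambda(d\omega). \qquad (48)$$

For this theorem to be valid it is required that $I(\tilde{\theta},\tilde{\phi};\phi) \ge 0$ and $E_0(\rho,\tilde{\theta},\tilde{\phi};\phi) \le 0$ for all
$\rho$ in $[-1,0]$. These inequalities are satisfied for all nontrivial $\tilde{\theta}$, $\tilde{\phi}$, and $\phi$.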

Remark 5: The functions $I(\tilde{\theta},\tilde{\phi};\phi)$ and $E_0(\rho,\tilde{\theta},\tilde{\phi};\phi)$ represent the mismatch rate distortion
function and the mismatch distortion exponent, respectively.
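The two mismatch functions can be evaluated numerically for simple spectra. The sketch below is an illustration under stated assumptions rather than part of the development: the integrands are our reading of the mismatch mutual information and distortion exponent for the water-filling pair $a = 1 - \theta/\tilde{\phi}$, $r = \theta$ (only frequencies with $\tilde{\phi} > \theta$ contribute), the integrals use a midpoint rule, and the function names and two-level spectra are hypothetical. It requires $-1 < \rho \le 0$ and a positive argument in the second logarithm.

```python
import math

def _grid(m):
    """Midpoints of m equal cells covering [-pi, pi], and the cell width."""
    h = 2.0 * math.pi / m
    return [-math.pi + (i + 0.5) * h for i in range(m)], h

def mismatch_rate(theta, phi_assumed, phi_true, m=4000):
    """(1/4pi) * integral of ln(pa/theta) + (pa - theta)(pt - pa)/pa^2
    over the set where pa > theta (nats per source letter)."""
    ws, h = _grid(m)
    total = 0.0
    for w in ws:
        pa, pt = phi_assumed(w), phi_true(w)
        if pa > theta:
            total += math.log(pa / theta) + (pa - theta) * (pt - pa) / pa ** 2
    return total * h / (4.0 * math.pi)

def mismatch_exponent(rho, theta, phi_assumed, phi_true, m=4000):
    """(1/4pi) * integral of rho*ln[(pa + rho*theta)/((1+rho)*theta)]
    + ln[1 + rho*(pa - theta)(pt - pa)/((1+rho)*pa^2)] where pa > theta."""
    ws, h = _grid(m)
    total = 0.0
    for w in ws:
        pa, pt = phi_assumed(w), phi_true(w)
        if pa > theta:
            total += rho * math.log((pa + rho * theta) / ((1.0 + rho) * theta))
            total += math.log(
                1.0 + rho * (pa - theta) * (pt - pa) / ((1.0 + rho) * pa ** 2))
    return total * h / (4.0 * math.pi)

def two_level(inner, outer):
    """Hypothetical spectrum: `inner` for |w| < pi/2 and `outer` elsewhere."""
    return lambda w: inner if abs(w) < math.pi / 2 else outer

assumed, true = two_level(2.0, 0.5), two_level(2.2, 0.3)
THETA = 1.0
rate_matched = mismatch_rate(THETA, assumed, assumed)
rate_mismatch = mismatch_rate(THETA, assumed, true)
e0_half = mismatch_exponent(-0.5, THETA, assumed, true)

# Slope of the exponent at rho = 0, estimated by a one-sided difference.
EPS = 1e-6
slope = (mismatch_exponent(0.0, THETA, assumed, true)
         - mismatch_exponent(-EPS, THETA, assumed, true)) / EPS
```

Two sanity checks are visible in this example: the exponent vanishes at $\rho = 0$ and is negative at $\rho = -0.5$, and its slope at $\rho = 0$ agrees with the mismatch rate, so that $E_0 - \rho R$ is positive for small negative $\rho$ exactly when $R$ exceeds the mismatch rate.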

The  corresponding  result  for  trellis  source  codes  is: 


Theorem 6: Under the assumptions of Theorem 5, consider the ensemble of trellis codes
of constraint length $K$ and rate $R = \frac{1}{N}\ln M$ satisfying (45) which is generated by
assigning $N$ letters from the user alphabet $V$ according to the pdf
$\tilde{p}_N(v) = \int_{U^N}\tilde{q}_N(u)\tilde{p}_N(v|u)\lambda_N(du)$, $v \in V^N$. Then, provided that eq. (44) is satisfied,
the average distortion $D_c$ over the ensemble of trellis source codes is upperbounded by


DK(p,9~<j>-,<f>) 


, -  ~(K-Dp 

s/3AM2 _ 

,  ,-1  E0(p.9~<t>-,<t>)/R  -  p\ 


where $-1 < \rho < E_0(\rho,\tilde{\theta},\tilde{\phi};\phi)/R$ and the parameter $\tilde{\theta}$ is determined by (42).

In the presence of uncertainty about the statistics of the source, in particular in the
presence of spectral uncertainty within the classes defined by (35), the goal is to choose $\tilde{\phi}$
involved in Theorems 5 and 6 above so that, if the code rate for the codes of the
ensembles considered in these two theorems is larger than a critical rate, then the average
distortion converges asymptotically to $D$ for all sources in the class.

The  result  corresponding  to  Theorem  3  is: 

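Theorem 7: Suppose the spectral measure $\Phi$ (let $\phi = d\Phi/d\lambda$) belongs to a class of the
form (35) and $\hat{\Phi}$ (with $\hat{\Phi} \ll \lambda$ and $\hat{\phi} = d\hat{\Phi}/d\lambda$) is the element of the class singled out
by Lemma 1. Then $(\hat{\theta},\hat{\phi};\hat{\phi})$, where $\hat{\theta}$ satisfies $D(\hat{\theta},\hat{\phi}) = D$ [apply (42) for $\tilde{\phi} = \hat{\phi}$], is a
saddle point for $\min_{(\tilde{\theta},\tilde{\phi})}\max_{\phi} I(\tilde{\theta},\tilde{\phi};\phi)$ under the fidelity constraint $D(\tilde{\theta},\phi) \le D$, i.e.,

$$I(\hat{\theta},\hat{\phi};\phi) \le I(\hat{\theta},\hat{\phi};\hat{\phi}) \le I(\tilde{\theta},\tilde{\phi};\hat{\phi}), \qquad (50)$$

where $D(\hat{\theta},\phi) \le D$ and $D(\tilde{\theta},\hat{\phi}) \le D$ for all $\phi = d\Phi/d\lambda$ with $\Phi$ in $\boldsymbol{\Phi}_w$; it is also a
least-favorable operating point for the distortion exponent, that is,

$$E_0(\rho,\hat{\theta},\hat{\phi};\phi) \ge E_0(\rho,\hat{\theta},\hat{\phi};\hat{\phi}) \qquad (51)$$

for all $\rho$ in $[-1,0]$. Finally, the condition

$$R > I(\hat{\theta},\hat{\phi};\hat{\phi}) \qquad (52)$$

is sufficient and necessary to guarantee that for the ensemble of block source codes of
length $n$ and rate $R$ (described in Theorem 5; just set $\tilde{\phi} = \hat{\phi}$) the average distortion
converges to the fidelity level $D$ exponentially with increasing $n$ for all sources in the
class.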


Remark 6: The quantity $I(\hat{\theta},\hat{\phi};\hat{\phi})$ given parametrically by

$$I(\hat{\theta},\hat{\phi};\hat{\phi}) = \frac{1}{4\pi}\int_{\{\hat{\theta} < \hat{\phi}(\omega)\}}\ln\frac{\hat{\phi}(\omega)}{\hat{\theta}}\,\lambda(d\omega) \qquad (53a)$$

$$D(\hat{\theta},\hat{\phi}) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\min\{\hat{\theta},\hat{\phi}(\omega)\}\,\lambda(d\omega) = D \qquad (53b)$$

represents the rate distortion function of the class determined by (35). For the same
class of spectral measures this function was derived in [12] directly from the definition
$\sup_{s \in S} R_s(D)$. No coding theorems were derived in [12]. In contrast, our emphasis in this
section was on the mismatch and the minimax robust theorems for block and trellis
source coding.
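The parametric pair can be traced numerically once a least-favorable density is available. The sketch below assumes that such a density is supplied by the caller as a function (the flat and two-level densities used here are hypothetical stand-ins, not least-favorable densities of any particular capacity class) and returns, for a given water level, the fidelity level and the corresponding rate in nats per source letter.

```python
import math

def class_rate_distortion_point(theta, phi_hat, m=4000):
    """Return (D, R) for water level theta:
    D = (1/2pi) * integral of min{theta, phi_hat(w)},
    R = (1/4pi) * integral of ln(phi_hat(w)/theta) where phi_hat(w) > theta."""
    h = 2.0 * math.pi / m
    d_sum = 0.0
    r_sum = 0.0
    for i in range(m):
        p = phi_hat(-math.pi + (i + 0.5) * h)
        d_sum += min(theta, p)
        if p > theta:
            r_sum += math.log(p / theta)
    return d_sum * h / (2.0 * math.pi), r_sum * h / (4.0 * math.pi)

# Hypothetical flat density with unit variance: D = theta, R = 0.5*ln(1/D).
d_flat, r_flat = class_rate_distortion_point(0.25, lambda w: 1.0)

# Hypothetical two-level density: 2.0 for |w| < pi/2 and 0.5 elsewhere.
d_two, r_two = class_rate_distortion_point(
    1.0, lambda w: 2.0 if abs(w) < math.pi / 2 else 0.5)
```

Sweeping the water level from the supremum of the density down to zero traces the whole curve; for the flat density the point at water level 0.25 is $(D,R) = (0.25, \ln 2)$, the familiar $\frac{1}{2}\ln(\sigma^2/D)$ with $\sigma^2 = 1$.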

The discussion of the choice of the operating point is similar to that which followed
the proof of Theorem 3 and we do not repeat it here. The corresponding result
for trellis codes is:

Theorem 8: Under the assumptions of Theorem 7, condition (52) guarantees that for the
ensemble of trellis source codes of constraint length $K$ and rate $R$ the average distortion
converges to the fidelity level $D$ exponentially with increasing $K$ for all sources in the
class. Furthermore, if we define $(\rho',\theta') = \arg\min_{(\rho,\theta)} D_K(\rho,\theta,\hat{\phi};\hat{\phi})$, where
$-1 < \rho < E_0(\rho,\theta,\hat{\phi};\hat{\phi})/R$ and $D(\theta,\hat{\phi}) \le D$, then the following inequalities are true:

$$D_K(\rho',\theta',\hat{\phi};\phi) \le D_K(\rho',\theta',\hat{\phi};\hat{\phi}) \le D_K(\rho,\theta,\hat{\phi};\hat{\phi}). \qquad (54)$$

It should be noted that the rate distortion function for the class of discrete-time
SGS's with spectral uncertainty, which is given parametrically by (53a)-(53b), can also
serve as an upper bound for the rate distortion function of the class of discrete-time
stationary ergodic non-Gaussian sources with the same spectral characteristics. This is the
case, since the Gaussian source is known to have ([24, Section 4.6.2]) the largest rate
distortion function among the stationary ergodic sources with the same spectral
characteristics.
Finally, all the results of this section can be extended to continuous-time stationary
Gaussian bandlimited (e.g., with spectral densities defined on $\Omega = [-\omega_0,\omega_0]$) sources.
Since Huber-Strassen derivatives of capacities with respect to $\sigma$-finite (and not finite)
measures can be constructed [22, Chapter IV], these results can be extended to
non-bandlimited [that is, with spectral densities defined on $\Omega = (-\infty,\infty)$] sources.
IV. SUMMARY AND CONCLUSIONS

We have addressed the problem of minimax robust coding for sources with
uncertainty in their statistical description. First, the mismatch source coding problem under
a fidelity criterion was formulated and the appropriate coding theorems were established.
Then, for uncertainty classes determined by 2-alternating capacities, coding theorems
were proved for discrete memoryless sources with uncertainty in the probability
distribution and single-letter difference distortion criteria, and for stationary Gaussian sources
with spectral uncertainty and the mean-square-error distortion criterion. It was
established that there exist random block source codes and random trellis source codes such
that the average distortion converges to the prescribed fidelity level exponentially with
increasing block length or constraint length, respectively, for all sources in the class,
provided that the code rates are larger than a critical rate. The rate distortion function for
the class of sources and the distortion exponent were evaluated. These quantities, as
well as the ensembles of random block and trellis source codes, were characterized in
terms of a Radon-Nikodym type derivative between the upper measure of the
uncertainty class and a Lebesgue measure defined on the appropriate set.